Supervised Learning Approaches for Automatic Structuring of Videos

Authors

  • Danila Potapov
  • Zaid HARCHAOUI
  • Patrick PEREZ
  • Ivan LAPTEV
  • Florent PERRONNIN
  • Matthijs DOUZE
  • Cordelia Schmid
Abstract

Automatic interpretation and understanding of videos remains at the frontier of computer vision. The core challenge is to lift the expressive power of current visual features (as well as features from other modalities, such as audio or text) so that they can automatically recognize typical video sections with low temporal saliency yet high semantic expression. Examples of such long events include video sections where someone is fishing (TRECVID Multimedia Event Detection), or where the hero argues with a villain in a Hollywood action movie (Action Movie Franchises). In this manuscript, we present several contributions towards this goal, focusing on three video analysis tasks: summarization, classification, and localization. First, we propose an automatic video summarization method that yields a short and highly informative summary of potentially long videos, tailored to specified categories of videos. We also introduce a new dataset for evaluating video summarization methods, called MED-Summaries, which contains complete importance-scoring annotations of the videos, along with a complete set of evaluation tools. Second, we introduce a new dataset, called Action Movie Franchises, consisting of long movies annotated with non-exclusive semantic categories (called beat-categories), whose definition is broad enough to cover most of the movie footage. Categories such as “pursuit” or “romance” in action movies are examples of beat-categories. We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots. Third, we give an overview of the Inria event classification system developed within the TRECVID Multimedia Event Detection competition and highlight the contributions made during the work on this thesis from 2011 to 2014.

Similar resources

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing natural-language answers using computational methods and machine learning algorithms. The development of large-scale smart education systems on the one hand, and the importance of assessment as a key factor in the learning process together with the challenges it faces on the other, have significantly increased the need for ...

FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos Supplementary material

Table 1 shows the per-video results for the 50 videos from the DAVIS dataset (referred to in Table 1 of the main paper). We compare with several semi-supervised and fully automatic baselines. Our method outperforms the per-video best fully automatic and best semi-supervised baseline in 25 out of 50 videos. Table 2 shows the per-video results for the 14 videos from the Segtrack-v2 dataset (referred...

PKU-ICST at TRECVID 2012: Instance Search Task

We participate in both types of the instance search task in TRECVID 2012: automatic search and interactive search. This paper presents our approaches and results. In this task, we mainly focus on exploring effective feature representation, feature matching, re-ranking algorithms and query expansion. In feature representation, we adopt two basic visual features and five keypoint-based BoW feat...

Weakly supervised learning from images and videos∗

With the amount of on-line available digital content growing daily, large-scale, weakly supervised learning is becoming more and more important. In this talk we present some recent results for weakly supervised learning from images and videos. Standard approaches to object category localization require bounding box annotations of object instances. This time-consuming annotation process is sides...

Publication date: 2015